1 Statement of Problem Studied

We have studied the problem of efficient execution of programs on parallel
computers, and on the B-HIVE architecture in particular, and the software
support required, concentrating on programming language implementation.

In a loosely-coupled multiprocessing environment, in which processors
communicate over a generalized hypercube or similar network, the cost of
sharing data among processes is quite high. We have investigated techniques
for minimizing the total communication cost in parallel programs. We have
also looked into ways of introducing parallelism in combinatorial problems
and observed the impact of randomization on the system speedup. Besides
the speedup, the network connectivity and data distribution in parallel and
distributed systems play a very important role in determining the system
performance. We have computed the reliability of such systems under
different environments. In addition, we have studied the design of topologies
with limited connections that could be appropriate for both LAN and
MAN applications.


2 Summary of the Most Important Results


We are interested in techniques for arranging parallel code onto
loosely-coupled processors so as to minimize communication overhead and
maximize processing speed. This is a synthesis problem, similar to the
problem of optimization and code generation in a traditional compiler. It is
very different from the analysis problem of discovering potentially parallel
operations in the program. Much previous research has centered on designing
programming languages that express parallelism explicitly, and on program
analyzers that discover implicit parallelism in sequential codes. Our work is
independent of these issues and compatible with either approach.

We use a medium grain parallelism model to minimize communication
overhead [1-10]. A medium grain model is shown to be an optimum way of
merging fine grain operations into parallel tasks such that the parallelism
obtained at the small grain level is retained and communication overhead is
decreased. Our “vertical partitioning” and scheduling techniques have been
evaluated by the simulation of ten EISPACK subroutines [7]. The vertical
partitioning model clearly outperforms the model without vertical
partitioning.

A new communication model has been introduced [1], allowing additional
overlap between computation and communication. Simulation results
indicate [8] that the medium grain communication model shows promise for
automatic parallelization for a loosely-coupled multiprocessor system.
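
The general idea of merging fine grain operations into larger tasks without
losing parallelism can be illustrated with a small sketch. This is not the
vertical partitioning algorithm of [7], whose details are not given in this
summary; it is a simple chain-collapsing heuristic chosen for illustration,
and the names merge_chains and cross_task_edges are hypothetical. It assumes
the program is given as a dependency DAG in which every node appears as a key.

```python
def merge_chains(succ):
    """Group the nodes of a dependency DAG into tasks by collapsing chains.

    succ maps every node to the list of its successors (nodes without
    successors map to an empty list). Node u is merged with its successor v
    when v is the only successor of u and u is the only predecessor of v,
    so no parallelism between independent operations is lost.
    """
    pred = {u: [] for u in succ}
    for u, vs in succ.items():
        for v in vs:
            pred[v].append(u)

    def links_to_next(u):
        # True when u and its single successor form a purely sequential pair.
        return len(succ[u]) == 1 and len(pred[succ[u][0]]) == 1

    tasks = []
    for u in succ:
        # Skip nodes that are absorbed into a chain started by a predecessor.
        if len(pred[u]) == 1 and links_to_next(pred[u][0]):
            continue
        task = [u]
        while links_to_next(task[-1]):
            task.append(succ[task[-1]][0])
        tasks.append(task)
    return tasks


def cross_task_edges(succ, tasks):
    """Count dependency edges that cross task boundaries (messages needed)."""
    owner = {u: i for i, task in enumerate(tasks) for u in task}
    return sum(1 for u, vs in succ.items() for v in vs if owner[u] != owner[v])


# Illustrative fine grain graph: a feeds two independent chains that join at f.
graph = {"a": ["b", "c"], "b": ["d"], "c": ["e"],
         "d": ["f"], "e": ["f"], "f": []}
tasks = merge_chains(graph)
print(tasks, cross_task_edges(graph, tasks))
```

In this example the two chains b, d and c, e each become one task and can
still run in parallel, while the number of dependency edges that require
communication drops from six to four.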

Since most of the computation in a program is performed inside loops,
parallelization of loop structures is an important topic and has been
extensively studied in the past. The particular problem of loop execution on a
loosely-coupled parallel processor has not received much attention, however.
We introduce a compensated loop scheduling technique that adjusts to the
delays incurred by distributed data in parallel execution. We also propose a
nested loop scheduling technique that allows heterogeneous loop allocation
[11]. Heterogeneous loop allocation adjusts to dependencies within loops and
in some cases permits better utilization of processors in the inner loops.
Additional experimentation [12] has been carried out on parallel recursive
least squares computation on distributed memory multiprocessors. All these
problems have been studied for efficient execution of programs on parallel
computers in general, and on the B-HIVE architecture [13,14] in particular.
We have shown that for some optimization problems, such as backtracking [15]
and branch and bound [16], a randomized search algorithm yields high
performance on a parallel processor. A major advantage of randomized search is
that very little interprocessor communication is required.
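
Why randomized search needs so little communication can be seen in a minimal
sketch. The N-queens problem, the function names, and the sequential
simulation of processors are all assumptions made for illustration; they are
not taken from [15] or [16]. Each simulated processor runs the same
backtracking search but visits the branches in its own random order, and the
only information that would have to be exchanged is the announcement that
some processor has found a solution.

```python
import random


def randomized_queens(n, seed):
    """Backtracking N-queens search that tries columns in a random order.

    Returns (solution, nodes), where solution[r] is the column of the queen
    in row r (or None if no solution exists) and nodes counts placements.
    """
    rng = random.Random(seed)  # private generator: no shared state
    cols, diag_sum, diag_diff = set(), set(), set()
    placement = []
    nodes = 0

    def place(row):
        nonlocal nodes
        if row == n:
            return True
        order = list(range(n))
        rng.shuffle(order)  # the randomization step
        for c in order:
            if c in cols or (row + c) in diag_sum or (row - c) in diag_diff:
                continue
            nodes += 1
            cols.add(c); diag_sum.add(row + c); diag_diff.add(row - c)
            placement.append(c)
            if place(row + 1):
                return True
            placement.pop()
            cols.discard(c); diag_sum.discard(row + c); diag_diff.discard(row - c)
        return False

    found = place(0)
    return (list(placement) if found else None), nodes


def is_valid(solution):
    """Check that no two queens share a column or a diagonal."""
    n = len(solution)
    return (len(set(solution)) == n
            and len({r + c for r, c in enumerate(solution)}) == n
            and len({r - c for r, c in enumerate(solution)}) == n)


def simulate_workers(n, num_workers):
    """Run one independent search per simulated processor.

    The worker that expands the fewest nodes stands in for the processor
    that would finish first on a real machine.
    """
    results = [randomized_queens(n, seed) for seed in range(num_workers)]
    return min(results, key=lambda result: result[1])


solution, nodes = simulate_workers(8, 4)
print(solution, nodes)
```

Because the workers share no state, adding processors only increases the
chance that one of them picks a favorable branch order early; the sketch does
not model timing or the actual B-HIVE interconnect.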

There are many other factors that influence the performance of a parallel
system. The impact of single faults on the system performance of a class
of cluster-based multiprocessors has been considered in [17]. The effect of
network connectivity and data distribution on various reliability and
performance parameters has been studied in [18-21]. A comparison of various
fault-tolerant multistage interconnection networks has been performed in
[22,23]. The connectivity requirements of LANs and MANs differ from those of
general parallel systems; appropriate topologies for such doubly connected
multidimensional networks, and their characteristics, have been covered in
[24-27]. Thus, the research results encompass more than the areas described
in the proposal.